Conversation
There was a problem hiding this comment.
Pull request overview
This pull request refactors the retry logic in the runStreamingTest function to implement a manual per-attempt timeout pattern, replacing the previous shared timeout approach. The changes also remove redundant cleanup code from TestE2EInvokeInlineAgentWithStreaming.
Changes:
- Replaced
retry.OnErrorwith a manual retry loop using exponential backoff for streaming tests - Changed from a single 30-second timeout across all attempts to a 10-second timeout per attempt
- Removed redundant deferred cleanup in
TestE2EInvokeInlineAgentWithStreaming(cleanup is already handled viat.Cleanupin setup functions)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
go/test/e2e/invoke_api_test.go
Outdated
| ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) | ||
| stream, err := a2aClient.StreamMessage(ctx, protocol.SendMessageParams{Message: msg}) | ||
| if err != nil { | ||
| cancel() | ||
| lastErr = err | ||
| continue | ||
| } | ||
| if msgResult.Status.Message != nil { | ||
| text += a2a.ExtractText(*msgResult.Status.Message) | ||
|
|
||
| resultList := []protocol.StreamingMessageEvent{} | ||
| var text string | ||
| for event := range stream { | ||
| msgResult, ok := event.Result.(*protocol.TaskStatusUpdateEvent) | ||
| if !ok { | ||
| continue | ||
| } | ||
| if msgResult.Status.Message != nil { | ||
| text += a2a.ExtractText(*msgResult.Status.Message) | ||
| } | ||
| resultList = append(resultList, event) | ||
| } | ||
| resultList = append(resultList, event) | ||
| cancel() |
There was a problem hiding this comment.
Context cancellation timing issue: The context is cancelled immediately after consuming the stream channel (line 284), but it should be deferred to ensure it's cancelled even if an error occurs during stream consumption. The current pattern doesn't follow Go best practices for context cancellation. Move the cancel() call to a defer statement immediately after line 264 to ensure proper cleanup. This is the pattern used elsewhere in the codebase (e.g., go/cli/internal/cli/agent/invoke.go:102-103).
go/test/e2e/invoke_api_test.go
Outdated
| backoff := defaultRetry.Duration * time.Duration(1<<(attempt-1)) | ||
| t.Logf("streaming attempt %d failed, retrying in %v: %v", attempt, backoff, lastErr) | ||
| time.Sleep(backoff) |
There was a problem hiding this comment.
The manual backoff calculation defaultRetry.Duration * time.Duration(1<<(attempt-1)) is a simple exponential backoff but doesn't use the Factor and Jitter fields from defaultRetry (lines 181-186). The defaultRetry config specifies Factor: 2.0 and Jitter: 0.2, but this implementation ignores both. Either use these fields in the calculation (backoff = Duration * Factor^(attempt-1) with jitter), or consider using wait.ExponentialBackoff or wait.ExponentialBackoffWithContext which properly implements the backoff strategy including jitter.
go/test/e2e/invoke_api_test.go
Outdated
| for event := range stream { | ||
| msgResult, ok := event.Result.(*protocol.TaskStatusUpdateEvent) | ||
| if !ok { | ||
| ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) |
There was a problem hiding this comment.
The per-attempt timeout has been reduced from 30 seconds (original shared timeout across all attempts) to 10 seconds per attempt. While this provides a clearer timeout boundary per attempt, verify that 10 seconds is sufficient for streaming operations in your test environment. If the streaming response legitimately takes longer than 10 seconds, this could cause flaky test failures.
| ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) | |
| ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) |
Signed-off-by: Jet Chiang <pokyuen.jetchiang-ext@solo.io>
15390ab to
55fdd34
Compare
Sharing a timeout context in all streaming retry attempts is causing trouble, it is now replaced by a fresh context for each retry. The agent not found error is caused by double deleting agents (not the root cause but fixed). Close #1328
Running
go test -v github.com/kagent-dev/kagent/go/test/e2e -failfast -count=10 -run TestE2EInvokeInlineAgentWithStreamingdid not fail (it previously did).